Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2024-03-22 06:55:42

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
https://arxiv.org/abs/2403.14438

Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users must begin each command with a trigger phrase. We explore this task in three ways: First, we train classifiers using only acoustic information obtained from the audio waveform. Second, we take the decoder outputs of an automatic speech recognition (ASR) syst…
